AITopics | piano roll


piano roll

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning and composing of classical music using restricted Boltzmann machines

Kobayashi, Mutsumi, Watanabe, Hiroshi

arXiv.org Artificial Intelligence, Dec-1-2025

We investigate how machine learning models acquire the ability to compose music and how musical information is internally represented within such models. We develop a composition algorithm based on a restricted Boltzmann machine (RBM), a simple generative model capable of producing musical pieces of arbitrary length. We convert musical scores into piano-roll image representations and train the RBM in an unsupervised manner. We confirm that the trained RBM can generate new musical pieces; however, by analyzing the model's responses and internal structure, we find that the learned information is not stored in a form directly interpretable by humans. This study contributes to a better understanding of how machine learning models capable of music composition may internally represent musical structure and highlights issues related to the interpretability of generative models in creative tasks.
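The pipeline in this abstract (scores converted to piano-roll images, then an RBM trained without labels) can be sketched in a few lines. The example below is a minimal pure-Python illustration, not the authors' code. It builds a binary piano roll from hypothetical (pitch index, onset step, duration) triples, flattens two-step windows into visible vectors, and trains a Bernoulli RBM with one-step contrastive divergence (CD-1). The window width, hidden-layer size, learning rate, and the CD-1 training rule are all assumptions chosen for illustration.

```python
import math
import random

def notes_to_piano_roll(notes, n_pitches, n_steps):
    """Binary piano roll: roll[p][t] == 1 if pitch index p sounds at step t.
    notes: list of (pitch_index, onset_step, duration_steps)."""
    roll = [[0] * n_steps for _ in range(n_pitches)]
    for pitch, onset, dur in notes:
        for t in range(onset, min(onset + dur, n_steps)):
            roll[pitch][t] = 1
    return roll

def windows(roll, width):
    """Flatten every width-step slice of the roll into one visible vector."""
    n_steps = len(roll[0])
    return [[roll[p][t] for p in range(len(roll)) for t in range(s, s + width)]
            for s in range(n_steps - width + 1)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1 if self.rng.random() < p else 0 for p in probs]

    def cd1_update(self, v0, lr):
        ph0 = self.hidden_probs(v0)                              # positive phase
        v1 = self.sample(self.visible_probs(self.sample(ph0)))   # one Gibbs step
        ph1 = self.hidden_probs(v1)                              # negative phase
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])

    def reconstruct(self, v):
        """Mean-field reconstruction: visible probabilities from hidden probabilities."""
        return self.visible_probs(self.hidden_probs(v))

def mean_sq_error(rbm, data):
    total = sum((x - r) ** 2 for v in data for x, r in zip(v, rbm.reconstruct(v)))
    return total / sum(len(v) for v in data)

# Usage: three short notes on a 6-pitch, 8-step grid.
rng = random.Random(0)
roll = notes_to_piano_roll([(0, 0, 2), (2, 2, 2), (4, 4, 2)], n_pitches=6, n_steps=8)
data = windows(roll, width=2)
rbm = RBM(n_visible=len(data[0]), n_hidden=4, rng=rng)
for epoch in range(200):
    for v in data:
        rbm.cd1_update(v, lr=0.1)
```

New material could then be drawn by Gibbs sampling (alternating `hidden_probs` and `visible_probs` with sampling) from a random or seeded visible vector. Even when such samples look plausible, the learned matrix `W` is just a grid of real numbers, which is the interpretability gap the abstract describes.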

artificial intelligence, machine learning, rbm

arXiv.org Artificial Intelligence

2509.04899

Country: Asia > Japan (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.63)


Fine-Tuning MIDI-to-Audio Alignment using a Neural Network on Piano Roll and CQT Representations

Murgul, Sebastian, Reiser, Moritz, Heizmann, Michael, Seibert, Christoph

arXiv.org Artificial Intelligence, Jun-30-2025

In this paper, we present a neural network approach for synchronizing audio recordings of human piano performances with their corresponding loosely aligned MIDI files. The task is addressed using a Convolutional Recurrent Neural Network (CRNN) architecture, which effectively captures spectral and temporal features by processing an unaligned piano roll and a spectrogram as inputs to estimate the aligned piano roll. To train the network, we create a dataset of piano pieces with augmented MIDI files that simulate common human timing errors. The proposed model achieves up to 20% higher alignment accuracy than the industry-standard Dynamic Time Warping (DTW) method across various tolerance windows. Furthermore, integrating DTW with the CRNN yields additional improvements, offering enhanced robustness and consistency. These findings demonstrate the potential of neural networks in advancing state-of-the-art MIDI-to-audio alignment.
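For context on the DTW baseline, the sketch below is a textbook dynamic time warping implementation over two sequences of feature vectors, using Euclidean local cost and the three standard step moves. How piano-roll columns and CQT frames would be mapped to comparable features is left open here; the one-dimensional toy features in the usage lines are an assumption for illustration only.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(x, y, cost=euclidean):
    """Classic dynamic time warping.
    Returns (total_cost, path); path lists (i, j) pairs aligning x[i] with y[j],
    running from (0, 0) to (len(x) - 1, len(y) - 1)."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(x[i - 1], y[j - 1])
            # Allowed moves: diagonal (match), vertical, horizontal.
            D[i][j] = c + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path

# Usage: a reference and a "performance" that lingers on frame 1.
ref = [[0.0], [1.0], [2.0], [3.0]]
perf = [[0.0], [1.0], [1.0], [2.0], [3.0]]
total, path = dtw(ref, perf)
```

Because the repeated frame in `perf` can be absorbed at zero cost, `total` is 0.0 and the path maps reference frame 1 onto two performance frames: `[(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)]`.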

artificial intelligence, machine learning, piano roll

arXiv.org Artificial Intelligence

2506.22237

Country:

Europe > Germany (0.16)
North America > United States (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition

Pasquier, Philippe, Ens, Jeff, Fradet, Nathan, Triana, Paul, Rizzotti, Davide, Rolland, Jean-Baptiste, Safi, Maryam

arXiv.org Artificial Intelligence, Feb-4-2025

We present and release MIDI-GPT, a generative system based on the Transformer architecture that is designed for computer-assisted music composition workflows. MIDI-GPT supports the infilling of musical material at the track and bar level, and can condition generation on attributes including instrument type, musical style, note density, polyphony level, and note duration. To integrate these features, we employ an alternative representation for musical material: we create a time-ordered sequence of musical events for each track and concatenate several tracks into a single sequence, rather than using a single time-ordered sequence in which the musical events of different tracks are interleaved. We also propose a variation of our representation that allows for expressiveness. We present experimental results demonstrating that MIDI-GPT consistently avoids duplicating the musical material it was trained on and generates music that is stylistically similar to the training dataset, and that attribute controls allow various constraints to be enforced on the generated material. We also outline several real-world applications of MIDI-GPT, including collaborations with industry partners that explore integrating MIDI-GPT into commercial products and evaluating it there, as well as several artistic works produced using it.
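To make the contrast between the two layouts concrete, the sketch below builds both from the same two-track input: a single interleaved, time-ordered stream versus one time-ordered sequence per track, concatenated. The token spellings (`TRACK_START`, `INSTRUMENT=…`, `NOTE_ON=…`, `TIME_SHIFT=…`, `TRACK_END`) are hypothetical and heavily simplified (no note-offs or durations); they are not MIDI-GPT's actual vocabulary.

```python
def interleaved(tracks):
    """One time-ordered stream; each note token names its track."""
    events = sorted((t, name, p) for name, notes in tracks.items() for t, p in notes)
    tokens, now = [], 0
    for t, name, p in events:
        if t > now:
            tokens.append(f"TIME_SHIFT={t - now}")
            now = t
        tokens.append(f"NOTE_ON={p}@{name}")
    return tokens

def track_concatenated(tracks):
    """One time-ordered sequence per track; the tracks are concatenated."""
    tokens = []
    for name, notes in tracks.items():
        tokens += ["TRACK_START", f"INSTRUMENT={name}"]
        now = 0  # time restarts at the beginning of every track
        for t, p in sorted(notes):
            if t > now:
                tokens.append(f"TIME_SHIFT={t - now}")
                now = t
            tokens.append(f"NOTE_ON={p}")
        tokens.append("TRACK_END")
    return tokens

# Usage: (time_step, midi_pitch) pairs per track.
tracks = {"piano": [(0, 60), (4, 64)], "bass": [(0, 36), (2, 43)]}
mixed = interleaved(tracks)
per_track = track_concatenated(tracks)
```

In the per-track layout each track occupies one contiguous span, which is one plausible reason it suits track-level infilling: a single span can be regenerated while the others remain fixed as context.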

machine learning, midi-gpt, natural language

arXiv.org Artificial Intelligence

2501.17011

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Spain > Community of Madrid > Madrid (0.04)
Europe > Germany > Hamburg (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


D3RM: A Discrete Denoising Diffusion Refinement Model for Piano Transcription

Kim, Hounsu, Kwon, Taegyun, Nam, Juhan

arXiv.org Artificial Intelligence, Jan-13-2025

Diffusion models have been widely used for generative tasks due to their convincing performance in modeling complex data distributions. Moreover, they have shown competitive results on discriminative tasks, such as image segmentation. While diffusion models have also been explored for automatic music transcription, their performance has yet to reach a competitive level. In this paper, we focus on the refinement capabilities of discrete diffusion models and present a novel architecture for piano transcription. Our model uses Neighborhood Attention layers as the denoising module, gradually predicting the target high-resolution piano roll conditioned on the fine-tuned features of a pretrained acoustic model. To further enhance refinement, we devise a novel strategy that applies distinct transition states during the training and inference stages of discrete diffusion models. Experiments on the MAESTRO dataset show that our approach outperforms previous diffusion-based piano transcription models and the baseline model in terms of F1 score. Our code is available at https://github.com/hanshounsu/d3rm.
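As a reference point for the F1 comparison, the sketch below computes frame-level precision, recall, and F1 between a predicted and a reference binary piano roll. This is a simplification: transcription work on MAESTRO commonly also reports note-level F1 with onset (and optionally offset) tolerances, usually computed with a library such as mir_eval, and the abstract does not say which F1 variant is meant.

```python
def frame_prf(pred, ref):
    """Frame-level precision/recall/F1 over binary piano rolls (rows = pitches, cols = frames)."""
    tp = fp = fn = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                tp += 1      # active in both
            elif p:
                fp += 1      # predicted but not in reference
            elif r:
                fn += 1      # in reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Usage: two pitches, four frames.
ref = [[1, 1, 0, 0], [0, 0, 1, 1]]
pred = [[1, 0, 0, 0], [0, 1, 1, 1]]
precision, recall, f1 = frame_prf(pred, ref)
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all equal 0.75.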

diffusion model, machine learning, natural language

arXiv.org Artificial Intelligence

2501.05068

Country: Asia > South Korea (0.15)

Genre: Research Report (0.64)

Industry: Media (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)


MidiTok Visualizer: a tool for visualization and analysis of tokenized MIDI symbolic music

Wiszenko, Michał, Stefański, Kacper, Malesa, Piotr, Pokorzyński, Łukasz, Modrzejewski, Mateusz

arXiv.org Artificial Intelligence, Oct-27-2024

Symbolic music research plays a crucial role in music-related machine learning, but MIDI data can be complex for those without musical expertise. To address this issue, we present MidiTok Visualizer, a web application designed to facilitate the exploration and visualization of various MIDI tokenization methods from the MidiTok Python package. MidiTok Visualizer offers numerous customizable parameters, enabling users to upload MIDI files to visualize tokenized data alongside an interactive piano roll.
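To give a sense of what a MIDI tokenization looks like, the sketch below turns a short list of notes into a simplified REMI-style sequence of Bar, Position, Pitch, Velocity, and Duration tokens. It is a hand-rolled illustration under assumed settings (4/4 time, 16 positions per bar, no velocity or duration binning) and does not use or reproduce MidiTok's API or its exact token vocabulary.

```python
POSITIONS_PER_BAR = 16  # assumed: 4/4 time quantized to sixteenth notes

def remi_like(notes):
    """notes: list of (onset_step, pitch, velocity, duration_steps) on a
    sixteenth-note grid. Returns a flat list of string tokens."""
    tokens = []
    current_bar = -1
    for onset, pitch, velocity, duration in sorted(notes):
        bar, pos = divmod(onset, POSITIONS_PER_BAR)
        while current_bar < bar:  # one Bar token per bar boundary crossed
            tokens.append("Bar")
            current_bar += 1
        tokens += [f"Position_{pos}", f"Pitch_{pitch}",
                   f"Velocity_{velocity}", f"Duration_{duration}"]
    return tokens

# Usage: C and E in bar 1, G at the start of bar 2.
notes = [(0, 60, 90, 4), (4, 64, 80, 4), (16, 67, 85, 8)]
tokens = remi_like(notes)
```

A visualizer of the kind described above essentially aligns each group of such tokens with the corresponding note rectangle in the piano roll.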

artificial intelligence, miditok visualizer, natural language

arXiv.org Artificial Intelligence

2410.20518

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > United States > New York > New York County > New York City (0.05)
Europe > Poland > Masovia Province > Warsaw (0.05)

Genre: Research Report (0.40)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.99)


Symbolic Music Generation with Fine-grained Interactive Textural Guidance

Zhu, Tingyu, Liu, Haoyu, Jiang, Zhimin, Zheng, Zeyu

arXiv.org Artificial Intelligence, Oct-10-2024

The problem of symbolic music generation presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To overcome these difficulties, we introduce Fine-grained Textural Guidance (FTG) within diffusion models to correct errors in the learned distributions. By incorporating FTG, the diffusion models improve the accuracy of music generation, which makes them well-suited for advanced tasks such as progressive music generation, improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effect of the FTG approach. We provide numerical experiments and a demo page for interactive music generation with user input to showcase the effectiveness of our approach.
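The general idea of correcting pitch errors during sampling can be illustrated with a much cruder mechanism: at a sampling step, suppress piano-roll activations on pitch classes outside a user-specified set (for example, a key), then threshold. The sketch below is a toy under stated assumptions, not the paper's FTG formulation; the function name, the pitch-class constraint, and the 0.5 threshold are illustrative choices.

```python
def mask_to_pitch_classes(probs, lowest_midi, allowed_pcs, threshold=0.5):
    """probs: rows = consecutive pitches starting at MIDI note `lowest_midi`,
    cols = time steps, values = model note probabilities at this step.
    Zeroes out disallowed pitch classes, then thresholds to a binary roll."""
    roll = []
    for offset, row in enumerate(probs):
        allowed = (lowest_midi + offset) % 12 in allowed_pcs
        roll.append([1 if allowed and p >= threshold else 0 for p in row])
    return roll

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes C D E F G A B

# Usage: three pitches (MIDI 60, 61, 62), two time steps.
probs = [[0.9, 0.1],   # MIDI 60 (C)
         [0.8, 0.7],   # MIDI 61 (C#): out of key, suppressed
         [0.2, 0.6]]   # MIDI 62 (D)
roll = mask_to_pitch_classes(probs, 60, C_MAJOR)
```

The result is `[[1, 0], [0, 0], [0, 1]]`: the confident but out-of-key C# activations are removed. A hard mask like this is the bluntest possible correction; guidance methods instead steer the sampler so that such notes become unlikely in the first place.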

diffusion model, machine learning, natural language

arXiv.org Artificial Intelligence

2410.08435

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)


Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

Huang, Yujia, Ghatare, Adishree, Liu, Yuanzhe, Hu, Ziniu, Zhang, Qinsheng, Sastry, Chandramouli S, Gururani, Siddharth, Oore, Sageev, Yue, Yisong

arXiv.org Artificial Intelligence, Jun-2-2024

We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression, many of which are non-differentiable and therefore pose a challenge when used for guided diffusion. We propose Stochastic Control Guidance (SCG), a novel guidance method that requires only forward evaluation of rule functions and works with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared to strong standard baselines in symbolic music generation, this framework demonstrates marked advancements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings. For detailed demonstrations, code and model checkpoints, please visit our project website: https://scg-rule-guided-music.github.io/.
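The key property claimed above is that guidance needs only forward evaluations of the rule. One simple way to exploit that, in the spirit of the stochastic-control idea behind SCG, is to draw several candidates from the stochastic sampling step and keep the one whose rule loss is lowest. The sketch below shows only that selection step. `toy_propose` is a hypothetical stand-in for a pretrained diffusion model's stochastic update, and the onset-based note-density rule and its target value are assumptions.

```python
import random

def note_density(roll):
    """Onsets per time step: a cell is an onset if it is on and was off
    in the previous frame (or lies in the first frame)."""
    n_steps = len(roll[0])
    onsets = sum(1 for row in roll for t in range(n_steps)
                 if row[t] and (t == 0 or not row[t - 1]))
    return onsets / n_steps

def rule_guided_step(propose, rule_loss, n_candidates, rng):
    """Draw candidates from a stochastic update and keep the lowest-loss one.
    rule_loss is only called, never differentiated."""
    candidates = [propose(rng) for _ in range(n_candidates)]
    return min(candidates, key=rule_loss)

def toy_propose(rng, n_pitches=4, n_steps=8):
    """Hypothetical stand-in for one stochastic denoising step."""
    return [[1 if rng.random() < 0.5 else 0 for _ in range(n_steps)]
            for _ in range(n_pitches)]

TARGET_DENSITY = 0.5  # assumed user-specified rule target

def density_loss(roll):
    return abs(note_density(roll) - TARGET_DENSITY)

# Usage
rng = random.Random(0)
best = rule_guided_step(toy_propose, density_loss, n_candidates=16, rng=rng)
```

In a full sampler this selection would be repeated at every denoising step; one natural choice is to score a predicted clean sample rather than the noisy intermediate, a detail omitted here.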

guidance, music, note density

arXiv.org Artificial Intelligence

2402.14285

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California (0.04)

Genre: Research Report (0.63)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Exploring Latent Spaces of Tonal Music using Variational Autoencoders

Carvalho, Nádia, Bernardes, Gilberto

arXiv.org Artificial Intelligence, Nov-6-2023

Variational Autoencoders (VAEs) have proven to be effective models for producing latent representations of cognitive and semantic value. We assess the degree to which VAEs trained on a prototypical tonal music corpus of 371 Bach chorales define latent spaces representative of the circle of fifths and the hierarchical relation of each key component pitch as drawn in music cognition. In detail, we compare the latent spaces of different VAE corpus encodings (piano roll, MIDI, ABC, Tonnetz, DFT of pitch, and pitch class distributions) in providing a pitch space for key relations that aligns with cognitive distances. We evaluate the model performance of these encodings using objective metrics that capture accuracy, mean squared error (MSE), KL divergence, and computational cost. The ABC encoding performs best in reconstructing the original data, while the Pitch DFT seems to capture more information from the latent space. Furthermore, an objective evaluation of 12 major or minor transpositions per piece is adopted to quantify the alignment of 1) intra- and inter-segment distances per key and 2) the key distances to cognitive pitch spaces. Our results show that Pitch DFT VAE latent spaces align best with cognitive spaces and provide a common-tone space where overlapping objects within a key form fuzzy clusters, which impose a well-defined order of structural significance or stability, i.e., a tonal hierarchy. Tonal hierarchies of different keys can be used to measure key distances and the relationships of their in-key components at multiple hierarchical levels (e.g., notes and chords). Our VAE implementation and the encodings framework are available online.
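One of the encodings compared above, the DFT of pitch(-class) content, can be computed directly from a 12-dimensional pitch-class vector. The sketch below returns DFT coefficient magnitudes. A useful property is that transposing the input (rotating the vector) changes only the phases, so the magnitudes are transposition-invariant. The counting scheme, the lack of normalization, and the choice to keep coefficients 0 to 6 are assumptions; the authors' exact encoding may differ.

```python
import cmath

def pitch_class_vector(midi_pitches):
    """Count occurrences of each pitch class (C=0, ..., B=11)."""
    v = [0.0] * 12
    for p in midi_pitches:
        v[p % 12] += 1.0
    return v

def pc_dft(v):
    """DFT coefficients X_k = sum_n v[n] * exp(-2*pi*i*k*n/12) for k = 0..6.
    For real input, X_7..X_11 are complex conjugates of X_5..X_1."""
    return [sum(v[n] * cmath.exp(-2j * cmath.pi * k * n / 12) for n in range(12))
            for k in range(7)]

# Usage: a C major triad and its transposition up a whole tone.
c_major = pitch_class_vector([60, 64, 67])   # C, E, G
d_major = pitch_class_vector([62, 66, 69])   # D, F#, A
mags_c = [abs(x) for x in pc_dft(c_major)]
mags_d = [abs(x) for x in pc_dft(d_major)]
```

For the C major triad, |X_0| is the note count (3) and |X_3| = |2 + i| = sqrt(5). `mags_d` matches `mags_c` because D major is a rotation of the same pattern.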

latent space, piano roll, representation

arXiv.org Artificial Intelligence

2311.03621

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Content-based Controls For Music Large Language Modeling

Lin, Liwei, Xia, Gus, Jiang, Junyan, Zhang, Yixiao

arXiv.org Artificial Intelligence, Oct-26-2023

Recent years have witnessed rapid growth of large-scale language models in the domain of music audio. Such models enable end-to-end generation of higher-quality music, and some allow conditioned generation using text descriptions. However, the control power of text over music is intrinsically limited, as text can only describe music indirectly through metadata (such as singers and instruments) or high-level representations (such as genre and emotion). We aim to further equip the models with direct, content-based controls on innate musical languages such as pitch, chords and drum tracks. To this end, we contribute Coco-Mulla, a content-based control method for music large language modeling. It uses a parameter-efficient fine-tuning (PEFT) method tailored for Transformer-based audio models. Experiments show that our approach achieves high-quality music generation with low-resource semi-supervised learning, with fewer than 4% as many trainable parameters as the original model and training on a small dataset of fewer than 300 songs. Moreover, our approach enables effective content-based controls, and we illustrate the control power via chords and rhythms, two of the most salient features of music audio. Furthermore, we show that by combining content-based controls and text descriptions, our system achieves flexible music variation generation and style transfer. Our source code and demos are available online.
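Content-based chord control requires turning chord symbols into a frame-aligned signal that the model can condition on. The sketch below shows one common choice, a per-frame 12-dimensional binary chroma vector. The `Root:quality` chord notation, the two-quality vocabulary, the frame rate, and the function names are assumptions for illustration, not Coco-Mulla's actual conditioning format.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
QUALITIES = {"maj": (0, 4, 7), "min": (0, 3, 7)}  # assumed minimal vocabulary

def chord_chroma(symbol):
    """'A:min' -> 12-dim binary chroma vector with the chord tones set to 1."""
    root, quality = symbol.split(":")
    r = NOTE_NAMES.index(root)
    v = [0] * 12
    for interval in QUALITIES[quality]:
        v[(r + interval) % 12] = 1
    return v

def chords_to_frames(chords, frame_rate, total_seconds):
    """chords: list of (start_sec, end_sec, symbol).
    Returns one chroma vector per frame; frames without a chord stay all-zero."""
    n_frames = int(total_seconds * frame_rate)
    frames = [[0] * 12 for _ in range(n_frames)]
    for start, end, symbol in chords:
        vec = chord_chroma(symbol)
        for f in range(int(start * frame_rate), min(int(end * frame_rate), n_frames)):
            frames[f] = list(vec)
    return frames

# Usage: two chords, one second each, at 2 frames per second.
frames = chords_to_frames([(0.0, 1.0, "C:maj"), (1.0, 2.0, "A:min")],
                          frame_rate=2, total_seconds=2)
```

A real system would run at the audio model's frame rate and might add rhythm or drum-track features alongside the chroma; the point here is only that chord symbols become a dense, time-aligned tensor.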

arxiv preprint arxiv, content-based control, transformer decoder

arXiv.org Artificial Intelligence

2310.17162

Country:

North America > United States > New York (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)


Generating symbolic music using diffusion models

Atassi, Lilac

arXiv.org Artificial Intelligence, May-15-2023

Denoising diffusion probabilistic models have emerged as simple yet very powerful generative models. Unlike other generative models, diffusion models do not suffer from mode collapse or require a discriminator to generate high-quality samples. In this paper, a diffusion model that uses a binomial prior distribution to generate piano rolls is proposed. The paper also proposes an efficient method to train the model and generate samples. The generated music has coherence at time scales up to the length of the training piano-roll segments. The paper demonstrates how the model can be conditioned on an input to harmonize a given melody, complete an incomplete piano roll, or generate a variation of a given piece. The code is publicly shared to encourage the use and development of the method by the community.
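A binomial (Bernoulli) diffusion over binary piano rolls has a closed-form forward marginal. Assume the standard formulation in which each step keeps a cell's value with probability 1 - beta_t and otherwise resamples it as a fair coin flip. Then P(x_t = 1 | x_0) = alpha_bar_t * x_0 + (1 - alpha_bar_t) / 2, where alpha_bar_t is the product of (1 - beta_s) for s = 1..t. The sketch below implements only that forward (noising) process. The linear schedule and its endpoints are assumptions and may differ from the paper's choices.

```python
import random

def linear_betas(T, beta_start=0.01, beta_end=0.5):
    """Assumed linear noise schedule with T steps."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bar(betas, t):
    """Product of (1 - beta_s) over the first t steps (t = 0 gives 1.0)."""
    out = 1.0
    for b in betas[:t]:
        out *= 1.0 - b
    return out

def q_sample(x0, betas, t, rng):
    """Sample x_t ~ q(x_t | x_0) for a binary piano roll x0 (list of rows).
    Each cell is 1 with probability a * x0 + (1 - a) / 2, where a = alpha_bar(t)."""
    a = alpha_bar(betas, t)
    return [[1 if rng.random() < a * v + (1 - a) / 2 else 0 for v in row] for row in x0]

# Usage: corrupt a tiny 2 x 4 piano roll halfway-ish through a 50-step schedule.
betas = linear_betas(50)
x0 = [[1, 0, 0, 1], [0, 1, 1, 0]]
rng = random.Random(0)
x_mid = q_sample(x0, betas, t=10, rng=rng)
```

As t grows, alpha_bar approaches 0 and every cell tends toward an unbiased coin flip, which serves as the binomial prior. For completing an incomplete piano roll, one common heuristic (not necessarily the paper's) is to re-noise the known cells with `q_sample` at each reverse step and paste them over the model's prediction.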

artificial intelligence, machine learning, piano roll

arXiv.org Artificial Intelligence

2303.08385

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.50)

Industry:

Media > Music (0.67)
Leisure & Entertainment (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)
